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I .  INTRODUCTION 

TWO  principal  methods  have  been  developed  for  inference  about  extreme 
quantiles  or  tails  of  distributions.  The  older  is  beised  on  extreme  value 
distributions  for  subsample  maxima.  Statistical  aspects  of  this  have  been 
developed  by  Prescott  emd  Walden  (1990),  Hosking  (1984)  and  Smith  (1985), 
amongst  others.  The  other  method  is  beused  on  the  Generalised  Pareto 
distribution  for  exceedances  over  high  thresholds.  This  waus  introduced  by 
Pickautds  (1975)  euid  developed  by  Davison  (1984),  Smith  (1984)  and  Hosklng 
and  Wallis  (1987),  though  there  is  a  much  larger  literature  of  related 
techniques  in  hydrology  (NERC  1975). 

A  major  issue  with  both  of  these  methods  is  that  certain  limiting 
distributions  are  used  as  statistical  models.  These  limiting 
distrihtitions  are  not  exact  in  finite  samples,  and  so  there  are  two 
sources  of  error.  One  source  is  the  usual  variance  of  estimators  of  the 
model.  The  other  source  is  a  bias  created  by  the  fact  that  the  assumed 
model  is  not  exact.  Ihis  creates  a  biais  versus  variemce  conflict  of  the 
kind  familiar  from  density  estimation,  nonparainetric  regression,  and  other 
statistical  techniques  involving  smoothing. 

Previous  studies  of  this  feature  have  been  Davis  and  Resnlck  (1984) 
and  smith  (1987)  on  threshold  methods  and  Cohen  (1987,  1988)  on  classical 
extreme  value  methods .  Joe ' s  ( 1987 )  results  partly  unify  the  two  methods . 
There  is  a  long  literature  on  estimating  an  index  of  regular  variation,  of 
tdtich  Hall  (1982)  and  Hall  «md  Welsh  (1984,  1985)  have  done  most  to  make 
the  connection  with  smoothing  problems.  Recent  references  on  this  Include 
Csorgo,  Horvath  and  Revesz  (1987),  Reiss  (1987)  and  Bierlant  and  Teugels 
(1987).  Simulation  results  have  been  given  by  Boos  (1984),  Gomes  (1986) 


and  Joe  ( 1987 ) . 


One  reason  why  the  literature  on  this  topic  has  been  so  disjointed  is 
that,  althou^^  there  has  been  much  literature  on  rates  of  convergence  in 
extreme  value  theory,  no  unified  approach  has  emerged.  In  particular,  the 
three  domains  of  attraction  tend  to  be  treated  entirely  separately,  so 
that  the  literature  is  three  times  as  long  as  it  should  be.  Recently, 
however.  Smith  (1988)  has  proposed  a  new  approach  bringing  together  the 
three  domains  of  attraction.  In  the  present  paper,  my  aim  is  to  show  how 
this  new  approach  leads  to  solicit  (approximate)  expressions  for  the  biM 
and  variance  of  estimators  of  extreme  quantiles.  Ihe  principal  features 
%diich  distinguish  this  approach  from  its  predecessors  are  that  the  three 
domains  of  attraction  are  dealt  with  together,  and  that  the  method  ai^lies 
to  a  very  wide  variety  of  estimators.  To  illustrate  this,  four  paurticular 
estimators  are  studied  in  detail.  The  results  may  therefore  be  used  to 
compare  one  estimation  method  with  another,  as  well  as  providing  a 
theoretical  resolution  of  such  questions  as  choosing  the  best  threshold. 
They  may  also  suggest  automatic  or  adaptive  techniques  of,  for  example, 
choosing  the  threshold,  but  I  do  not  consider  this  aspect  in  any  detail. 
For  this  rezison  the  results  should  be  viewed  as  providing  general 
qualitative  guidelines  rather  than  determining  a  specific  procedure  for  a 
particular  set  of  data. 

The  orgemisation  of  the  paper  is  as  follows.  Section  2  revievre  the 
development  of  Smith  ( 1988 )  on  probedjilistic  approximations  to  extreme 
value  distributions.  Sections  3-6  develop,  in  turn,  biais  and  variance 
approximation  for  four  previously  studied  methods  of  estimating  extreme 
quantiles: 

(i)  estimation  beused  on  the  exponential  distribution  for  exceedances 
over  high  thresholds, 

( li )  estimation  beuied  on  the  Generalised  Pareto  distribution  for 
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exceedances, 

(ill)  estimation  based  on  the  Gunibel  distribution,  and 
(iv  estimation  based  on  the  Generalised  Extreme  Value  distribution. 
Section  7  concerns  the  application  of  these  results  to  theoretical 
comparisons  among  the  procedures,  in  particular  giving  new  results  for  the 
comparison  of  <i)  euid  (iii).  Finally,  in  Section  8  we  give  numerical 
results  aiKi  comparisons  with  existing  simulations. 


2.  EXTREME  VALUE  APPROXIMATIONS 

‘Ihe  present  section  is  a  summary  of  the  results  of  Smith  ( 1988 ) . 


The  starting  point  is  the  representation 

X 

f 

X 


-log  P(x)  ■  exp 


X  <  X  <  X 
* 


(2.1) 


where  )  is  the  range  of  the  distribution.  This  is  for  the  cleissical 


approach  based  on  extreme  value  distributions;  for  the  threshold  approach 
we  replace  -log  P(x)  by  l-P(x)  in  (2.1).  The  two  representations  lead  to 
the  same  results  except  for  a  few  distributions,  (e.g.  uniform, 
eiqx>nential )  for  which  the  convergence  is  very  rapid. 


From  (2.1)  we  deduce 
-log  F(  u-<-x;^(  u ) ) 


-log  P(u) 


-  ejq* 


1-1 

-  |i+x4>'(y))j 


4.(u) 


4>(  u+s<t)( 
-l/(t>’(y) 


-  du  [ 

u))  J 


(2.2) 


for  some  y  between  u  and  u-fx^(u),  by  Taylor  expansion  and  the  mean  value 


theorem.  This  assumes  4>  continuously  differentiable. 
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Suppose  new 

lim  <t>’(x)  -  (-00  <  <  00)  (2.3) 

x-»x* 

This  is  essentially  von  Mises*  condition  for  the  existence  of  a  limiting 
extreme  value  distribution.  We  now  have  three  levels  of  approximation. 

First  aoDroximat ion  Replace  4>*(y)  in  (2.2)  by  y^.  Now  if  a^j,  b^  are 
defined  such  that  -log  P(b^)  =  n“^,  a^^  =  <^(bjj),  then  replacing  u  by  b^  in 
(2.2). 

-i/y© 

lim  -n  log  FCa^x+b^)  -  (l+yQ*) 

n->ao 

and  hence 

•"l/y 

lim  p"(anX+bn)  -  exp{-(l+yjjX)  ,  (2.4) 

n4oD 

valid  wherever  1  +  y^x  >  0.  'Hiis  is  the  Generalised  Kxtreine  Value 
distribution.  In  the  special  case  y^-a  O,  the  right  hand  side  of  (2.4) 
reduces  to  exp(-e''*),  comnonly  called  the  Guinbel  distribution. 

Second  approximation  Replace  J>'(y)  in  (2.2)  by  (>'(u).  Then,  defining  et,, 
and  b^  ais  above,  and  y^  ^'(b^),  we  have  the  aproximation 

F"(a„x+b„)  ~  ejq)|-(l+ynX)  (2.5) 

which  in  general  improves  on  (2.4).  this  is  the  penultimate 
approximation.  In  the  ceuse  y^-  O,  use  of  the  penultimate  approximation 
generally  improves  the  rate  of  convergence  (Cohen  1982,  Gomes  1984).  This 
is  not  so  «dien  y^^  O,  but  even  here  the  quality  of  approximation  for 
moderate  n  is  generally  improved  by  the  penultimate  approximation  (Gomes 
and  Pestana  1986,  Smith  1988). 
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CTiird  aPDroxlmation  The  approximation  may  be  further  improved  by 
expanding  However,  some  care  is  needed  to  do  this  in  a  sufficiently 

general  way.  Simple  Taylor  expansion  does  not  suffice.  Smith  (1988) 
proceeded  under  the  assumption 

♦  '(u){it>'(u+w«t>(u))-<t>*(u)} 

J  im  -  «  c  (2.6) 

uTx*  g(u)hp(l+w«t>*(u)) 


for  each  w  such  that  1  y^w  >  0.  Here  p  and  c  are  real  nundbers,  g  is  a 
non-negative  function,  and  is  defined  by 


x^-1 

P 


P  O  , 


log  X  ,  p  »  0  . 


(2.7) 


This  assumption  covers  most  common  distributions.  For  instance: 

(i)  Assume  -log  P(x)  -  Cx~“(  1  +  Dx~^  +  o( x“^ ) )  as  x-»®.  This  Includes 
such  cases  as  Pareto,  Cauchy,  t  and  P  distributions.  In  this  case 
P  “  -(3/  g(u)  -  u”^,  c  -  -0/3^  a~^(/3-l). 


( il )  Assume  x*  <  a  and 

-log  P(x)  “  C(x*-x)®  (l+D(x*-x)^  +  o((x*-x)^)} 

as  X  T  X*.  Then  y^  ••  -a~^,  p  -  p,  g(u)  **  (x*-u)^,  c  -  -D/3^  a”^(P+l). 
This  includes  most  distributions  in  the  weibull  domain  of  attraction. 


( ill )  Assume  y^*  0  under  what  Cohen  ( 1982 )  called  claiss  N,  which  includes 
most  distributions  in  the  Gundael  domain  of  attraction.  In  this  case  it  is 
valid  to  malce  a  Taylor  expansion  of  4>.  Equation  (2.6)  holds  with 
arbitrary  p,  g(u)  -  ^u)|^”(u)|,  c  -  ±i.  This  case  covers  the  normal, 
lognormal  and  gamma  distributions,  amongst  many  others. 
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In  all  three  cases,  additional  differentiability  asutumptinns  are 
involved,  made  precise  in  Smith  (1988). 


Substituting  from  (2.6)  into  (2.2),  defining  a^,  b^,  as  before  and 
r^  ~  g(bp),  routine  but  tedious  manipulations  lead  to 


-Vy„ 


P"(a„x+b„)  ~  exp[-(  l+icy„)  "  {l+or„H  (x,y„))l 


(2.8) 


Whenever  1  +xyj|  >  o.  Where 


Hp(x.y) 


h  ( l+xy  )+ph_.  (  1+xy )-(  p+1 ) log(  1+xy ) 

- * - — — 1* - r  ltxy>0 

p(p+l  )y3 


(2.9) 


L  O 


l+xy<0  . 


The  rates  of  convergence  for  the  three  approximations  are  ©(y,,-  y^^) 
for  (2.4),  0(rj,)  for  (2.5)  and  0(^1,)  for  (2.8).  Under  mild  additional 
assun^ions,  these  rates  hold  pointwise,  uniformly  over  all  x,  in  total 
variation  norm,  and  finally  in  Bellinger  distance.  The  Bellinger  distance 
between  two  distributions  with  densities  f  and  g  is  defined  by 

.1  ±  2  s 

H(f,g)  -  c|{f*(x)  -  g*(x))  dx]*  . 

The  use  of  Bellinger  distance  in  the  context  of  extreme  value 
approximations  was  first  proposed  by  Reius  (1984)  and,  as  will  be  seen  in 
later  sections,  greatly  simplifies  the  proofs  of  statistical  results. 

In  the  case  of  (2.8),  there  is  a  small  problem  in  that  the  right  hand 
side  may  not  be  a  distribution  function.  In  this  case,  however,  a  slight 
modification  does  converge  in  Bellinger  distance  at  rate  o( r^ ) .  The 
principal  additional  assumption  for  this  is  that  y^  >  -1/2,  an  assumption 
Which  is  natural  in  view  of  the  regularity  conditions  for  meucimum 
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likelihood  estimation  (Smith  1985). 

For  the  threshold  approach,  -log  F  is  replaced  by  l-F  In  (2.1)  and 
(2.2).  we  then  have  three  approximations  for  {l-P(u+xij>(u))/{l-F(u)) ! 


-l/X 

First  approximation  (l+y^x)  ,  for  x>0,  l+yQX>0,  ~<o<y^<co.  This  is 

the  Generalised  Pareto  tail. 

“l/Vu 

(l+y^x)  where  y^  <l>*(u). 

(l+y^x)  (1  +  cr^^Hp( X, y^ ) )  (2.10) 

The  errors  of  these  approximations  are,  respectively, 

0(r^),  o(r^),  and  these  are  valid  for  Hellnger  distance  as  before. 


3.  ESTIMATION  BASED  ON  AN  EXPONENTIAL  TAIL 

In  this  section  we  consider  the  estimation  of  em  extreme  quantile 
under  the  assumption  that  the  conditional  distribution  of  exceedances  over 
a  high  threshold  is  exponential.  By  the  previous  section,  this  is  the 
same  as  a  Generalised  Pareto  approximation  with  y^-  o.  The  method  is 
effectively  the  same  as  that  of  weissman  (1978). 

Ihe  general  framework,  here  emd  throughout  the  rest  of  the  paper,  is 
that  we  have  N  independent  observations  from  a  common  unknown  distribution 
function  F,  and  we  are  interested  in  solving  l-F(q)^  for  given  small  p. 
Fix  a  high  threshold  u,  let  k  denote  the  (random)  nuniber  of  exceedances  of 


Third  approximation 


tdiere  r, 


u 


g(u). 
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u,  and  let  denote  the  excesses  over  u.  Given  k,  we  have 

Yi, ...,Y^  independent  with  coomon  distribution  function 


F(  u+y  )-F(  u ) 
i-F(u) 


y  >  0 


(3.1) 


under  the  First  Approximation.  Here  we  assume  O. 


The  method  that  then  suggests  itself  is  to  estimate  ©  *»  i-F(u)  by  k/N 
£uid  then  fit  an  ejqponential  distribution  to  Y^,...,Yj^,  estimating  by  the 

sample  mean  Y.  Ibese  approximations  lead  to  the  estimator 

q  -  u  +  Y  (3.2) 

assuming  q  >  u.  Except  for  changes  in  notation,  this  is  identical  to 
equation  (4.3)  in  Weissman  (1978),  except  that  weissman  took  k  rather  than 
u  as  predetermined. 


An  approximation  for  the  vari2U:ice  of  q  may  be  made  by  the  usual  delta 
method,  assuming  the  eiqwnential  distribution  of  the  excesses  and  an 
approximate  Poisson  distribution  for  k.  This  leads  to 

var(q)  ~  -  logr(  — )  +  -  ~--(l+n^)  (3.3) 

k  Np  k  N© 

where  t)  «  log(e/p). 


The  next  step  is  to  approximate  the  bias  of  q  due  to  misspecification 
of  the  ejqxsnentlal  distribution.  Tn  view  of  the  results  described  in 
Section  2,  the  obvious  thing  to  do  is  to  use  the  next  order  of 
approximation,  i.e.  the  Generalised  Pareto  distribution,  in  place  of 
(3.1).  In  other  words,  we  replace  the  right  hand  side  of  (3.1)  by 
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1 


fl  *  "-) 

I  4>(u)J 


-1/y 


(3.4) 


ti^re  y  ■-  y^  -  ^*(u).  Now,  the  mean  of  a  Generalised  Pareto  distribution 
with  parameters  y  zind  ^  is  (l-y)~^(^,  zissuming  y<l.  Moreover,  under  the 
Generalised  Pareto  model  the  true  quantile  would  be  u  t  i^y~^(  e^-1 ) . 
Hence 

A  f  n  e^^l  1 

E(q-q)  «■  (t>{ - +  o(y>[ 

U-y  y  J 

“  ♦yq(l-T\/2  +  0(1)}  .  (3.5) 


Combining  (3.3)  and  (3.5)  leads  to  an  approximation  for  mean  squared  error 


MSE(q) 


4>^(u) 


l+q" 


NO 


(3.6) 


So  fau:,  this  development  has  bMn  heuristic.  I  shall  now  outline  a 
framewor)c  within  which  these  formulae  appear  as  rigorous  auiymptotic 
results.  Ihe  intention  of  this  is  to  clarify  the  status  of  the  preceding 
approximations,  as  well  as  indicating  a  starting  point  for  possible 
further  theoretical  development. 


Suppose  N  -♦  «,  u  «  U(f  ->  X*,  0  *  ©II  ^  O,  N©^  -4  »,  T)  =*  ■*  tTq,  Where 

0  <  <  OB.  Let  4i||  -  4KU|,),  y„  -  ©'(Un),  Pj,  -  ©>|eiqK -hn ) •  Suppose  also 

i 

Vn<'*®N>*  ^ 


where  8  is  finite  (possibly  O).  Define 


1 


2 


^-(14 

N©|, 


% 


2). 


(3.8) 


Consider  the  following  hypothetical  model.  For  each  N,  1Cf|  has  a  binomial 
distribution  with  parauneters  N,  ©u  and,  given  kn  k,  the  excesses 
Yi,...,Y]^  are  independent  Generalised  Pareto  with  parameters  ©||,  y|,. 
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4.  ESTIMATION  BASED  ON  A  GENERALISED  PARETO  TAIL 

method  of  Section  3  will  now  be  extended  to  one  in  which  the 
exponential  distribution  (3.1)  is  replaced  by  the  Generalised  Pareto 
distribution  for  the  purpose  of  constructing  the  estimator.  This  was  the 
method  studied  in  det2iil  by  Smith  (1987),  but  the  new  probabilistic 
approximations  of  Section  2  allow  the  results  to  be  developed  in  a  much 
more  coherent  fashion. 


Suppose,  then,  that  (3.4)  is  used  in  place  of  the  approximation  in 
(3.1),  and  that  estimators  y,  ^  are  obtained  for  the  parameters  y, 
Throughout  this  paper  it  will  be  assumed  that  maximum  lUcelihood  is  used 
as  the  method  of  estimation,  though  there  is  nothing  to  stop  similar 
calculations  being  made  for  other  estimators,  such  as  the  Probability 
Weighted  Moments  estimators  of  Hosicing  and  Wallis  (1987).  As  before,  the 
exceedance  probability  e  is  estimated  by  k/N,  and  this  leads  to  a  qpjantlle 
estimate 

q  »  u  +  $y“^(exp(y  tj)  -  i)  where  ti  -  log|-^j  .  (4.1) 

In  this  section  y^  (from  (2.3))  is  2u:bitrary,  but  we  assume  y^  >  -1/2  for 
regularity. 


Applying  the  delta  method,  with  an  approximate  Poisson  distribution 
for  ]c  and  the  covariemce  matrix  of  (y,  $)  derived  from  the  Fisher 

information  matrix  (Smith  1987),  we  obtain  the  approximation 

-  .V  l+y  2  2  2  02e2>^ 

var(q)  a  (20  -  200^02  +  (l+y)D2)  +  - - -  (4.2) 

N9  NO 

where  y  -  y^  -  4>*(u),  0  -  (Xu),  tj  -  log(e/p)  and 


dq  e'^'^l 
94  y 


9q 

9y 


fr^yh  eV^l 


(4.3) 


Define  ^  by  (3.2)  with  suffices  N  added  where  appropriate.  Also  let 
denote  the  exact  quantile  under  the  hypothetical  model,  l.e. 

%  -  “m  +  ♦M(exP<yN^M)  -  II/Vm  • 

For  this  model  it  is  eaisy  to  make  all  the  approximations  rigorous  and  it 
follows  that 

-  q^l)  <  X)  ->  ♦  X  -  erjojl  -  (3.9) 

Where  e  is  the  standaird  normal  distribution  function.  Here  denotes  the 
probability  measure  of  the  hypothetical  model. 

ihe  hypothetical  model  differs  from  the  true  model  only  in  the 
distribution  of  excesses  over  the  threshold.  By  the  results  of  Section  2, 
the  Bellinger  distance  between  the  true  and  Generalised  Pareto 
distributions  is  of  o(y||).  Horeover,  the  Bellinger  distance  between  two 
product  measures  grove  in  proportion  to  the  square  root  of  the  nuaiber  of 
components  (Reiss  1984),  so  the  Bellinger  distance  between  P^  and  the  true 

j_ 

probability  measure,  Pj,  say,  is  of  o(k^*  y^).  By  (3.7),  this  tends  to  0. 
Since  Bellinger  distance  doadnates  total  variance  distance,  it  follows 
that  P^A(|)  -  O  for  any  sequence  of  events  (Ajj).  Hence  (3.9) 

remains  true  with  P||  replacing  P^  and  the  true  quantile  q^j  replacing  qjj. 

What  this  says,  in  effect,  is  that  probability  calculations  baised  on 
(3.3),  (3.5)  emd  the  noimal  distribution,  are  uymptotically  correct. 

This  avoids  the  direct  question  of  whether  (3.6)  itself  is  valid  as  an 
asymptotic  approximation,  something  which  involves  additional 
moment-convergence  technicalities  of  the  kind  developed  in  Section  2.5  of 
Goldie  and  Smith  (1987). 
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To  approximate  the  bieus  of  q,  the  method  is  to  use  the  higher  -order 
approximation  (2.10).  Iiet  us  revnrite  this  in  the  form 

j-VV/ 


P(u+y)  -  F(u)  „ 

- -  . 

l-P(u) 


1  + 


4> 


1  (‘  *  '“'4»  •’']}  ■ 


over  the  range  on  vAiich  it  is  a  valid  distribution  function. 

The  bi^ls  of  $  2uid  y  are  calculated  by  the  method  given  in  Section  2 
of  Smith  (1987),  idiich  leads  to  the  approximation 


■4>  -  4» 

A 

ly  -  y 


z  M  ^  E 


aiog  g(Y) 


d<^ 

aiog  g(Y) 


ay 


(4.5) 


v/here  g  is  the  Generalised  Pareto  density,  M  the  Fisher  Information 
matrix,  and  the  eiqiected  values  are  calculated  under  the  model  (4.4).  Now 


-1 

N 

-  (i+y) 

20^  -0 

”0  ity. 

Slog  g(y) 

1  1 

fi  +  il  [i 

do 

oy  0 

Ir  1 1 

aiog  g(y) 

ay 

-  ^  log 

yy 

0 


1-1 


(4.6) 


The  range  of  y  is  0<y<«  if  y>0,  0<y<-y“^  if  y<0. 


Suisse  Y  has  distribution  function  (4.4).  Define  u  to  be 
(l+yY/4»)"^/’'.  NOW 

1 

E((l+yY/0)*^}  -  e{o~>^}  -  1+yr  |  p[o<u}  du  . 

O 


Writing  out  the  distribution  function  of  U  and  using  the  identity 


j  u~'^  h_(u“'^)du  - - ^ - 

0  <l-yr-yp  )(l-yr) 


we  deduce 


_L  _ 21 _ 

[I  ♦  J  J  1-yr  ( l-yr-yp  )( l-yr+y )( l-> 


•nie  conditions  required  for  this  are  yr<i,  yr+yp<l.  Taking  the  special 
cases  r-l,  r— l  and  the  limit  we  deduce 


^raiog  g(Y)|  ^  __ 

1  3^  1  ^ 


l+y-yp)<  l+y)(  l+2y) 


raiog  g(Y) 


1  «( 3+2 

]  (l+y^p)(l- 


3+2y-2yp) _ 

( i-yp)(  i+y)(  i+2y) 


Hence  from  (4.S)  we  have 


E(4.  -  4,}  S!  - 


( l+y-yp)(  l-yp) 


E{y  -  y)  » 


e( 2+y-yp) 

( l+y-^p)(  l-yp) 


(4.7) 


Using  (4.7)  with  a  Taylor  eiqtansion  in  (4.1), 

y  (l+y-yp)(l-yp) 


Finally,  for  the  true  quantile  under  (4.4)  we  have 


^  e^-1  _  re^^-i 

q  -  u  +  (i>  -  +  «4»  eyn  H - , 

y  ^  I  y 


y  +  o(e) 


and  hence 


~  .  (2+>^P)D2-40i  ^ 

E(q  -  q)  ~  « - * - i  -  0  e>^„ 

(l+y-yp)(l-yp)  P 


^  e  B(y,p,ir)) 


re’^l  1 

'  ’'J 


(4.8) 


where 


15 


TjeVn  e^-1 

y  y 

-  Hp  '  y]  * 

Conibining  (4.2)  amd  (4.8)  one  can  obtain  an  approximation  for  the  mean 
squared  error.  This  can  be  used  as  a  basis  for  deciding  such  questions 
ais  the  optimal  choice  of  threshold  and  whether,  in  the  cause  of  y^^O,  the 
method  of  this  section  is  preferable  to  the  method  of  Section  2.  Such 
questions  are  discussed  at  length  in  Smith  (1987). 


B(y,p,T)>  - 


( l+y-yp)(  i-yp) 


(2+Y-yp) 


nils  may  all  be  mauie  rigorous  by  a  similaur  process  to  that  in  Section 
3.  Making  the  same  definitions  as  there,  plus  Cjj  ■>  cg(Uj|),  let  denote 
the  approximation  to  the  variamce  of  obtained  from  (4.2)  and  (4.3). 
In  place  of  (3.7)  aissume 


-1 

Itm  W  4^  B(y^,p,Ti  )  -  6  (4.10) 

N4«  " 

where  S  is  finite.  Since  y^^  and  qjif  are  converging  to  constauits,  this 

X 

limit  will  exist  vAtehever  k^*  «(,  converges.  Consider  a  hypothetical  model 
in  which  the  approximation  used  in  (4.4)  is  taken  to  be  exact,  ■men  all 
the  preceding  steps  can  be  deduced  by  standard  aisyinptotic  arguments.  The 
Hellinger  distance  between  the  hypothetical  and  true  models  tends  to  0, 
leeuling  to 


<5»  -  %) 


<  xl  -»  ♦  (X“8) 


(4.11) 


where  P^j  is  the  true  probability  model.  As  before,  this  has  the 
interpretation  that  probability  calculations  made  on  the  basis  of  (4.2), 
(4.8)  aind  the  normal  distribution,  are  etsymptotically  valid. 
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5.  ESTIMATION  EASED  ON  THE  GUMBEL  DISTRIBUTION 

We  now  consider  a  cleuisical  extreme  value  procedure  in  %<hich  the 
sample  of  size  N  is  divided  into  roughly  k  bloclcs  of  size  n,  and  one  of 

the  cleissical  extreme  value  distributions  fitted  to  the  block  maxima.  In 

this  section  we  assume  the  Gumbel  distribution 

G(y)  «  ej5>[-exp{-(y-M)/<T)],  -®  <  y  <  ®  .  (5.x) 

Ohis  model  only  makes  sense  if  it  is  the  correct  limiting  distribution, 
i.e.  if  and  we  assume  that  throughout  this  section. 

If  (5.1)  is  valid  for  G  »  P”  and  we  are  interested  in  the  p-upper 

quantile  of  F,  then 

q  -  4  -  fflog  [-log((l-p)"}l  .  (5.2) 

In  limits  aa  p*0,  np  «  Np/k  ■*  3~^,  (5.2)  simplifies  to  q  »  4+<n), 

which  suggests  the  estimator 

q  »  M  +  on  ,  (5.3) 

where  4,  a  are  the  maximum  likelihood  estimates  beuied  on  a  Gumbel 
distribution  for  block  nuocima. 

In  the  following  r  denotes  the  gamma  function,  ti'  ^  r'/r  the  digama 
function  and  <  the  Riemann  zeta  function.  For  -\|;’(1)  *  0.5772156649  we 
write  y  (Euler's  constant),  in  place  of  the  more  usual  y,  to  avoid 
confusion  with  the  shape  parameter.  Also  <I>’(1)  -  C(2)  -  rr^/6,  «|>"(i)  - 
-2C(3)  -  -2.404113806,  \J»"' ( 1 )  -  6C(4)  -  irVl5  (Abramowitz  and  Stegun, 
1964).  The  inverse  Fisher  information  matrix  is 
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_i  6a 
“  !?■ 


-  +  (i-yr 

6 


l-y 


1-y 


(5.4) 


(Gunibel  1958),  and  hence 


var(q)  ~  ^  (l-y+h)^ 


} 


-  (1.10866  +  0.51404T)  >  0.60793^-^) 

k 


(5.5) 


To  Study  the  bieis  of  q,  we  embed  the  Gunibel  distribution  in  the 
Generalised  Extrema  value  family 


G(y)  -  exp 


(5.6) 


Where,  for  the  approximation  to  p",  we  take  n  »  b,j,  a  “  a^,  y  »  m  in 
Section  2.  The  method  is  similar  to  (4.5),  in  that  the  eiqpected  values  of 
the  derivatives  of  the  Gunibel  log  likelihood,  with  respect  to  n  and  a,  are 
evaluated  under  (5.6). 

If  Y  has  distribution  function  (5.6)  and  density  g(y)  ^  9(y;M>a'#y)/ 
then  we  may  write 

vv 

X  -  H  +  -  (e^^-l )  -  >i  +  oZ  + - +  . . . 

y  2 

where  Z  heus  a  standard  (/i«0,  aa>i)  Gunibel  distribution.  Writing 

fl-ojq?| 


d  1 

—  (log  g(Y;»z,a,o))  -  - 
9ii  a 


■[-  -r)|  - ; 


-Z  y2^e-2 


1-e  + 


o(  y^ )  , 
. 


—  (log  g(Y;*i,<r,0)}  -  - 
d<T  a 


-  (-1  +  Z  -  Ze“^  +  -  (Z^  -  z^e"^  +  Z^  e"^)  +  0(y^)} 
a  2 
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and  using  the  formula 


E{z’^e"®*}  -  (-1)’^  (l+s)  , 


(5.7) 


ve  find,  to  first  order  in  y. 


(a  1  y  »  y  ~  "*71 

—  log  g(y;M,o-,0>[  -  —  r  (2)  =  —  -  -  2y  +  y^ 

dM  J  2a  2a  l6  J 


-  0.41184  -  , 

a 


log  g(Y;M,a.0)j  -  ^  {r"(l)  -  r"(2)  -  r"’(2)) 
=  ^  j2y  -  vj>"(i)  -  2  ajd-y)  -  (i-y)^j 


0.33248  -  . 

a 


A  calculation  similar  to  (4.S)  then  leads  to 


M  -  M 

A 

a  -  a 


~  y  a 


r  0.S420S 
0.30798 


(5.8) 


As  eui  aside,  it  should  be  pointed  out  that  (5.8)  is  identical  with  Theorem 
1  of  Cohen  ( 1987 )  When  the  mathematical  technicalities  of  the  latter  are 
disregarded. 


Application  to  (5.3),  and  comparison  with  the  true  quantile  under  the 
Generalised  Extreme  model,  leads  to  the  approximation 

E(q  -  q)  ~  y  a  (0.54205  +  0.30798ti  -  0.5ti2)  .  (5.9) 

As  in  previous  sections,  these  calculations  may  be  made  rigorous  within  a 
suitable  asymptotic  framework. 
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6.  ESTIMATION  BASED  ON  THE  GENERALISED  EXTREME  VALUE 
DISTRIBUTION 

The  estimator  is  developed  in  the  same  way  as  in  Section  5,  but  the 
Generalised  Extreme  Value  distribution  <5.6)  is  used  in  place  of  the 
Gunibel  distribution  (5.1).  It  is  not  necessary  to  assume  y^'O,  but  we  do 
take  Xq  >  —1.  The  estimator  is  now  taken  to  be 

q  «  U  +  o  y~^  (exp(y  n)-!}  (6.1) 

with  maximum  likelihood  estimates  ii,a,y,  fitted  to  the  k  block  maxima. 


The  Fisher  information  matrix  heus  been  given  by  Prescott  2md  Walden 
(1980)  in  the  form 


"l^  "12  °*13 


m 


”13 


a 


m 


‘33 


(6.2) 


where  m^^j^  -  P,  m,,  » 


r( 2+y)-P 


12 


“IS  =  -  ;  [q  -  ;  ]  '  "»22  -  ^  {l-2r(2+y)+P}  . 


1  ~  {l-r(2+y)}  P 

m-  - - X  -  y  + - Q  +  - 

yZ  y  y 


®33  “  y2 


ir^  f  ~  i  1  2Q 

+  l-  y  +  -  - 

6  I  y  J  y 


p 

+  -O 


p  -  (i+yr  r(i+2y)  , 


Q  -  r(2+y)  mi+y)  + 


l+y 


}  ■ 
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Here  r  and  ^  are  the  gamna  and  digamna  functions  and  y  is  Euler’s 
constant.  Hence  we  deduce 


var(q) 


f 

]c 


-1 


13 


(6.3) 


%dtere  has  entries  (»ij)  (i.e.  (6.2)  with  cp»1),  -  (D^^, 02,03)  and 


o 


1  * 


aq 

dn 


1, 


aq  e^-1 
da  y 


O 


3 


1  aq  n  vn 
- -  _  e'  <•  -  - y — . 

<r  ay  y  y^ 


(6.4) 


nte  values  of  a  and  y  are  here  taXen  to  be  those  appropriate  to  the 
approximation  of  P",  i.e.  a  -  a^j,  y  •*  y„  in  the  notation  of  Section  2. 


*rhe  bias  of  q  will  again  be  computed  by  enibedding  the  Generalised 
Extreme  Value  distribution  in  a  larger  family,  here  (2.S).  More 
precisely,  consider  the  extended  family 


G(y) 


y(y-u) 

a 


ri/r  , 

fy-p 

1 

1  1 

l| 

ffe.5) 


with  five  puameters  This  is  used  as  an  approximation  to 

P”(y),  with  n  -  b„,  cr  -  a„,  y  *  yn»  «  =  cg(b„).  As  in  Section  4,  it 
suffices  to  define  (6.S)  over  the  range  within  which  it  is  a  valid 
distribution  function,  setting  G(y)  to  be  O  or  1  outside  this  range. 


By  euialogy  with  (4.5),  the  biases  of  the  parameter  estimates  are 


given  by 


B 


A 

y  _  y 


aiog  g(Y) 
9n 

aiog  g(Y) 
da 

aiog  q(Y) 
ay 


(6.6) 
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%A)ere  M  is  the  Fisher  information  (6.2),  g  is  the  Ctonora  1i. sc Rxtreme 
Value  density  and  Y  is  distributed  according  to  (6.5).  The  main  task  is 
to  evaluate  these  expectations  in  the  limit  as  e  ■>  0. 

First,  note  that 

aicag  g(y)  ^  3^  f  ^  y(y-M)|~^  _  i  y-M)|~^'^'^^ 

dH  <t  j  <tI  aj 


aiog  g(y) 

-  fi  -  fi 

,  y<y-M)r^'^^ 

1  aiog  g(y) 

a<T 

<Ty  1  1 

a  1 

y  9^t■ 

aiog  g(y) 

ay 

1  1 

log 

y  y^ 

{. .  -->)  . 

1 

a  1 

a  aiog  g(y) 

y  da 

(6.7) 

feV2_i 

where  we  have  written  Hp  in  place  of  Hp| - 


Prom  the  aefinition  of  Hp  in  (2.9)  and  using  (S.7),  we  deduce 


e“®^  H, 


fe^-i 


p^(  p*-l  )y^ 


(1+s-py) 


-  (1+8)  +  (l+s)-H*‘>  (1+s+y)}  +  p(p+l>y  H^^^Vl+s) 


(6.9) 

When  cM),  all  the  e)q>res8ions  in  (6.8)  of  course  have  expectation  O, 
as  is  readily  verified  directly  from  (5.7).  We  therefore  evaluate 
expectations  of  the  0(tf)  terns  in  (6.8),  using  (6.9),  to  deduce,  as  e  ->  o. 


_fdlog  g(Y)l  « 
e[— - - 1  .  -  P<y,p)  , 

raiog  g(Y)l  «  p  1  , 


.faiog  g(Tr)l  f_  ,  Q(y.P)  .  P(y,p)l 

~  ‘r ■  “IT- "  -7-) ' 


(6.10) 


where 


1+y 

p(y.p)  - 

p’'(pfi  )r* 


(l-yp)r(l+7^p)  -  r(l+y) 


+  p2(r(l+y)-(l+y)r(l+2y))  +  p(p+l)y(r(l+y)+r*(l+y)) 


“  :3r,^Tx'^  [r(2-yp)-l  +  p2{i-r(2+y))  +  p(pn)y(l-^)]  . 
p  vP^i)y 


R(y,p)  -  -y 


p‘(pfl)y-' 


yp{r(l-yp)  +  pr(l+y)-p-i)  -  r'(2-py) 


2 

IF  *** 


+  l->^p*{l-r-r’(2+y))  -  p(p+l)y(-  -  2y+7) 

D 


(6.11) 
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We  have  used  the  relations  r'(l)  =  -y,  r'(2)  =  1-y,  r"(2)  =  rr^e  -  2y+y^. 


Equations  (6.6)  and  (6.10)  may  be  conibined  into 


H  -  II 


a  ~  a 

E  -  ~  e  S(y»p) 

a 


(6.12) 


y  -  y 


Where 


B<y.p)  “ 


P(  y*p> 


Q(y/P)  - 


P(y/P) 


Q(y.p)  P(y/P) 

R(y»p) - + - 

y  y 


being  as  in  (6.3). 


We  now  apply  these  results  to  calculate  the  bias  in  q.  From  (6.1) 
and  (6.12)  we  deduce 


~  ...  oe^-l  m 

E(q)  ~  p  + - +  ea  J5(y,p) 

y 


(6.13) 


where  Q  is  as  in  (6.4).  We  also  have,  from  the  inverse  of  (6.5),  that  the 
true  quantile  under  (6.5)  satisfies 


q  ~  p  +•  C -  +  «0  e^  H_  - 

y  y 


(6.14) 


Combining  (6.13)  and  (6.14)  we  have 


-  m  fe^^^l  1 

E(q-q)  ~  w  B<y»p)  -  ”p[~7~  ’ 


(6.15) 


These  aproximations  may  again  be  nuide  rigorous  using  arguments  similar  to 


those  in  previous  sections. 
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7.  COMPARISONS  OF  THE  METHODS 

The  results  o£  the  preceding  four  sections  may  be  applied  to  a  number 
of  questions  concerning  comparisons  among  the  methods.  Among  these 
questions  are:- 

1.  Choice  of  threshold  (threshold  methods)  or  of  blocic  size 
(classical  methods). 

2.  Choice  between  two-parameter  and  three-parameter  approximations, 
euisuming  y^^O. 

3.  Choice  between  the  threshold  and  classical  approaches. 

In  each  case,  a  meainingful  comparison  must  Lake  the  biaus  eis  well 
as  the  variance  term  into  account,  zus  otherwise  it  would  be  possible  to 
achieve  very  high  accuracy  by  taking  the  threshold  very  low  or  the  block 
size  very  small. 

Problem  1  has  been  considered  by  Pickands  (1975),  Hall  auid  welsh 
(1985)  euid  Smith  (1987)  in  the  threshold  case,  Cohen  (1987,  1988)  in  the 
classical  case.  Problem  2  wais  also  studied  by  Smith  (1987)  in  the 
threshold  case,  Cohen  (1988)  in  the  cleissical  case.  Problem  3  has  been 
considered  by  Cunneuie  ( 1973 )  and  Ftosbjerg  ( 1985 )  in  the  hydrology 
literature,  and  in  a  preprint  by  J.  Husler  euid  J.  Tiago  de  Oliveira,  but 
these  authors  have  considered  only  the  variance  of  the  estimators  and  have 
neglected  the  bias. 

To  illustrate  how  these  ideas  may  be  developed,  I  consider  here  the 
comparison  of  the  methods  of  Section  3  and  5,  assuming  O. 

Consider  first  the  threshold  model  with  exponential  exceedances. 
Assume  the  total  sample  size  N  and  the  desired  tail  probability  p  are 
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fixed.  Then  equation  (3.6)  may  be  written  in  the  form 

MSE(q)  ~  +  y^Tj^  ~  2I  ]  (7.1) 

whore  k  «  N©  denotes  the  expected  number  of  exceedances  of  the  threshold  u 
and  ^«<Ku),  y“4>'(u),  t)  =  -log{Np/k). 


Similarly,  under  the  Gumbel  model  of  Section  S,  equations  (5.5)  and 
(5.9)  lead  to 


MSE(q) 


O'* 


ri.  10866  +  0.51404T)  +  0.60793n^ 

L  Ic 

+  y^( 0.54205  +  0.30798n  -  0.50^)^ 


(7.2) 


where  ^  ^  ^  *  ~iog(NpA).  n  «  U/Tc  and  b^  satisfies 

-log  r(h^)  -  n"^. 


If  k  is  fixed  then  u  in  (7.1)  ^u1d  b^  in  (7.2)  are  (aloiost)  the  same; 
hence  so  aire  4>  and  y  in  the  two  equations.  Horeover,  over  the  range  of 
values  of  k  vAiich  are  of  interest,  both  ^  and  y  vary  only  slightly,  so  we 
may  effectively  treat  these  as  constants.  (Precise  justification  of  this 
last  stat«nent  will  not  be  made,  but  the  key  point  is  that 
(Ku4y^u))/^u)->l,  ♦'(u+y^(u)/4>'(u)-»l,  as  u4x*  for  fixed  y.  These 

properties  follow  from  the  assunptions  made  in  section  2). 


It  follows,  then,  that  we  may  conpare  the  two  procedures  by  directly 
comparing  the  eiq>ressions  (7.1)  and  (7.2),  treating  k  as  a  free  parameter. 
Ttie  conparison  depends  on  N  and  p  only  through  the  product  Np.  As  an 
exanple.  Figure  1  shows  the  t%ro  mean  squared  errors  plotted  agad.nst  k  for 
Np^l,  yO.l,  ^1.  nie  minimum  values  are  0.494  for  the  Gumbel  procedure 
at  k«*23  ,  0.462  for  the  threshold  procedure  at  k-42.  nius  the  optimal  k  is 
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aljnost  twice  as  large  for  the  threshold  procedure  as  for  the  Gundbel 
procedure,  auid  the  ratio  of  minimuin  mean  squared  errors  is  1.07  in  favour 
of  the  threshold  procedure.  In  comparative  terms,  very  simileu:  results 
were  obtained  for  other  values  of  Np  and  y  tdiich  were  tried. 

In  the  papers  cited  earlier,  the  cooqparison  between  the  two 
procedures  %«as  bemed  on  variance  alone,  under  the  assun^tion  that  the  same 
Jc  is  used  in  each.  ntere  is  no  re^uson,  however,  to  make  such  an 
assuBiption .  The  present  study  thus  favours  the  threshold  method  so  long 
as  k  is  chosen  c^timally  (or  nearly  optimally),  though  in  view  of  the 
asys^totic  nature  of  the  result  and  the  fairly  small  differences  between 
the  procedures  it  would  be  wrong  to  rezid  too  much  into  this  conclusion. 
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8 .  COMPUTATIONAL  RESULTS . 

■nils  section  describes  numerical  comparisons  of  all  four  procedures 
for  a  number  of  parent  distributions. 

Simulation  studies  of  similar  questions  have  previously  been 
published  by  Boos  ( 1984 ) ,  Gomes  ( 1986 )  and  Joe  ( 1987 ) .  Boos  compared  the 
eiqponential  method  of  estimating  extreme  quantiles  with  the  nonpareunetric 
quantile  estimates,  for  a  variety  of  values  of  k.  Gomes  made  a  comparison 
of  the  Gumbel  and  Generalised  Extreme  Value  distributions  for  estimating 
extreme  quantiles  in  the  classical  approach  to  extreme  value  theory.  When 
the  limiting  (ultimate)  approximation  is  Gumbel.  Her  results  generally 
support  the  use  of  the  Generalised  Extreme  Value  distribution,  especially 
for  estimating  the  more  extreme  quantiles.  Joe  made  a  nuadber  of 
comparisons  of  bias  and  mean  square  error  for  all  four  procedures  studied 
in  this  paper.  Joe  concluded  that,  in  general,  estimation  based  on  the 
Generalised  Pareto  distribution  is  slightly  superior  to  that  based  on  the 
Generalised  Extreme  value  distribution,  but  that,  in  all  cases  it  is 
importeuit  not  to  take  k  too  large. 

Some  attempt  that  has  been  made  to  rnproduce  the  msu  l.ts  of  these 
three  papers  using  the  approximations  of  Sections  3-6.  Hy  approximations 
support  their  broad  conclusions  but  do  not  reproduce  their  detailed 
numerical  results.  This  is  probably  because  the  sample  sises  are  too 
small  for  the  <Mymptotic  results  to  be  reasonable.  For  instance,  Joe 
assumed  total  sample  sise  N-600  of  Which  a  typical  run  was  based  on  k«30 
blocks  of  size  n<-20.  In  the  following  discussion  I  assume  a  san^le  size 
N»10000.  Although  this  is  much  larger  than  the  sample  sizes  in  the  earlier 
studies  it  is  not  at  all  unreaisonable  for  many  applications  in  the 
hydrology/mctcjTOlogy  areas. 
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Figure  2  plots  the  root  meem  squared  error  for  the  estimator  of  the 
I/lOOOO-quantile,  (i.e.  based  on  N-lOOOO  observations,  calculated 

using  the  approximations  in  Sections  3-6,  for  various  parent  distributions 
and  across  a  wide  range  of  k.  Corresponding  calculations  were  also  made 
for  other  quamtiles  with  generally  similar  results. 

One  property  that  is  sometimes  observed  with  these  calculations  is  a 
'*bias-cancellation**  phenomenon  -  the  biats  becomes  zero  owing  to  a 
cancellation  of  the  terms  contributing  to  it.  Ibis  is  observed  in 
Pig. 2(a)  which  is  based  on  the  standard  normal  distribution  for  the 
observations  from  nfhich  the  sample  is  drawn,  ibe  bias  for  the  Generalised 
Pareto  method  is  zero  near  k«2100  and  that  for  the  Generalised  Extreme 
Value  method  neair  k>230O.  Ibis  results  in  an  unusual  shape  of  the  tvfo 
curves,  with  the  acqparently  optimal  k  very  large.  Prom  a  practical  point 
of  view  it  %KHild  be  unwise  to  rely  on  being  able  to  exploit  the  bias 
cancellation  and  the  most  significeuit  feature  of  Pig. 2(a)  is  that  the 
Generalised  Pareto  method  does  better  them  the  Generalised  Extreme  Value 
method  for  most  of  the  range  of  k .  'Rie  other  two  methods  are  not  even 
shown  because  their  mean  squared  errors  are  far  lauger.  In  contrast. 
Fig. 2(b)  shoifs  the  four  plots  for  the  lognormal  distribution  (a-1).  In 
this  case  the  bias  comparisons  again  favour  the  Generalised 
Pareto/Generalised  Extreme  Value  procedures,  but  the  variances  are  much 
lower  for  the  exponential/Cumbel  procedures,  with  the  exponential  coming 
off  best.  Pig. 2(c)  shows  a  gamma  distribution  (scale  parameter  i,  shape 
paraaMter  5)  reflected  about  the  origin  -  this  comparison  would  also  be 
valid  for  inference  about  the  lower  tail  when  the  paurent  distribution  is  a 
three-parameter  gamma.  1!be  exponential  and  Gumbel  procedures  are  not 
applicable  here  because  the  reflected  gazma  distribution  is  not  in  the 
Gumbel  domain  of  attraction.  In  this  case  the  Generalised  Pareto  method 
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does  better  for  large  k,  the  con^rison  being  rather  similar  to  Fig.l.  A 
conqparison  for  the  upper  tail  of  a  t^.  distribution  (Fig. 2(d))  leeuls  to 
rather  similar  conclusions  except  for  a  rather  drastic  bias  czuncellation 
effect  at  the  right  hand  side.  Finally,  the  Weibull  distribution  function 
1  -  eicp{-(n//3)®)  was  tried,  with  a-O.5,  p^.X  in  Fig. 2(e),  a-l.5,  /3-1  in 
Fig. 2(f).  Fig.2(e)  shows  the  Gunibel  and  exponential  procedures  dominant, 
lugely  because  the  variances  again  dominate  the  coo^rison.  In  contrast. 
Fig. 2(f)  shows  the  Generalised  Pareto/Generalised  Extreme  Value  procedures 
dominant.  In  comparing  these  tii#o  Weil>ull  distributions,  it  may  well  be 
important  that  the  a^.5  case  is  heavier-tailed  than  an  exponential 
distribution  2Uid  in  this  respect  conqparable  with  the  lognormal,  vrt\ile  the 
op-l.S  cue  is  lighter-tailed  than  the  exponential  distribution  and 
therefore  comparable  with  the  normal.  The  clusification  of  distributions 
into  lighter  than  exponential,  ai^roxiinately  exponential  and  heavier  than 
exponential  tails  was  also  made  by  Boos  (1984),  who  referred  to  earlier 
unpublished  work  by  Breiman,  Stone  and  Gins. 

Summarising  the  results  so  far,  the  following  general  observations 
may  be  oubdei 

1.  In  cases  in  the  domain  of  attraction  of  a  Gunbel  distribution  the 
Generalised  Pareto/Generalised  Extreme  Value  procedures  perform  better 
than  the  e]qx>nential/<>uiiibel  procedures  when  the  tail  is  lighter  than 
exponential,  but  the  comparison  appears  to  be  reversed  when  the  tall  is 
heavier  than  exponential.  Also  in  the  "heavier  than  exponential"  c2tBe  the 
optimal  k  is  much  smaller. 

2 .  In  all  cases  the  comparison  bet%Men  the  threshold  and 
corresponding  classical  procedure  appears  to  be  similar  to  Fig.l,  i.e.  the 
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classical  procedure  does  better  at  small  k  but  the  threshold  procedure 
achieves  its  minimum  mean  squared  error  at  a  leurger  value  of  k  and  is  then 
superior.  The  exceptions  to  this  are  %dien  bias  czmcellation  is  observed. 

Gomes  ( 1986 )  coii%>ared  the  Gumbel  and  Generalised  Extreme  Value 
methods  by  simulation/  remarking  that  the  Generalised  Extreme  Value  method 
seems  to  do  better  as  higher  quantiles  are  estimated.  However,  both  the 
distributions  she  used  for  simulation  (normal,  and  a  modified  form  of 
Weibull  with  a>«4)  sure  lighter-tailed  than  e3q)onential  so  the  present  study 
suggests  her  simulations  were  not  extensive  enough  to  support  those 
conclusions.  Boos  (1984)  also  remarked  on  the  difficulties  of  the 
heavier-than-exponenti2d.  case,  even  suggesting  that  si]ic>le  nonparametric 
estimators  might  do  better  in  such  cases.  Another  simulation  study, 
though  not  directly  treating  the  bias  vs.  variance  aspect  of  the  problem, 
led  Hosking,  Wallis  and  Wood  (1985)  to  conclude  that  maximum  likelihood 
estimation  has  poor  san^ling  properties  in  the  heavy-tailed  (y>0)  case, 
2uid  to  propose  the  method  of  probability  weighted  moments  as  an 
alternative.  All  these  studies  point  to  the  need  for  a  more  detailed 
theoretical  study  of  the  heavy-tailed  case,  including  perhaps  the 
development  of  alternative  estimators  with  smaller  mean  squared  errors 
than  maximum  likelihood. 

So  far  no  indication  has  been  given  of  the  accuracy  of  the  proposed 
approximations  to  the  bias.  A  theoretical  way  to  assess  this  is  as 
follows.  Suppose  we  have  a  very  large  sample  from  p"  for  a  given  finite 
n,  or  from  a  threshold  distribution  for  fixed  threshold.  The  peurameters 
of  the  fitted  model  (respectively.  Generalised  Extreme  or  Generalised 
Pareto)  will  converge  to  those  values  which  maximise  the  e]q)ected  log 
likelihood  of  the  fitted  model  under  the  true  distribution.  These 
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limiting  pareoneters  may  be  calculated  by  a  combination  of  numerical 

integration  zmd  numerical  optimisation,  and  result  in  exact  expressions 
for  the  bias  -  "exact"  in  the  sense  that  they  do  not  rely  on  any 

approximations  for  the  distribution  function.  (Obey  are  still  only 
approximations  for  finite  k. )  In  Table  l  this  is  calculated  for  nolOO  or 

500  and  the  distributions  used  in  Figure  2.  Three  approximations  to  y  are 

shown:  the  crude  approximation  <t>*(u)  of  Section  2,  the  "bias  corrected" 
approximation  obtained  from  (4.7)  or  (6.12)  and  the  "exact"  value  given  by 
the  procedure  just  described.  Also  shown  are  biases  for  t%ra  quantile 
estimates  where,  for  M-iOOOO,  corresponds  to  p>-000l  (as  in  Figure  2) 
and  q2  to  p^-ooos.  These  are  expressed  eis  a  percentage  of  the  absolute 
value  of  the  quantity  being  estimated.  These  results  show  that,  although 
in  most  cases  our  approximations  are  the  right  order  of  magnitude,  one 
would  have  to  go  to  even  larger  samples  before  they  were  really  accurate. 
In  one  sense  this  does  not  matter,  because  in  practice  we  would  not  have 
the  information  to  obtain  the  exact  biases  anyway,  so  a  more  important 
feature  of  our  results  is  that  they  lead  to  the  right  qualitative 
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9.  SUMMARY  AND  CONCLUSIONS 

'Hie  main  purpose  of  the  paper  hats  been  to  obtain  ai^roximate 
expressions  for  the  bias  and  variance  of  four  established  extreme-value 
procedures,  and  to  use  these  to  study  the  optimal  value  of  k  and  to 
compare  the  four  procedures  in  terms  of  mean  squared  error.  The  studies 
have  in  general  supported  the  qualitative  conclusions  of  simulation 
studies  by  other  authors  auid  have  also  suggested  some  new  atspects.  In 
particular,  threshold  procedures  tend  to  require  a  larger  optimal  k  than 
classical  procedures,  and  then  to  achieve  a  lower  mean  squared  error. 
Concerning  the  cosqparison  between  exponential/Gundoel  procedures  on  the  one 
hamd,  and  Generalised  Pareto/  Generalised  Extreme  value  procedures  on  the 
other,  in  many  cases  the  study  supports  the  latter,  but  not  in 
heavy-tailed  cases  in  the  Gumbel  domain  of  attraction  ( lognormal,  Weibull 
with  a<l). 

Important  open  questions  are: 

1.  Data-beised  (adaptive)  estimation  of  k.  This  may  be  best  achieved  by 
trying  to  estimate  the  bias,  and  then  choosing  k  to  minimise  the  estimated 
mean  squared  error.  It  seems  to  me  that  "general”  methods  of  bias 
estixiation,  such  as  the  jackknife  and  bootstrap,  are  unlikely  to  work  in 
their  usual  form,  in  view  of  the  rather  specialised  nature  of  the  bias 
probl«n,  but  some  xodifications  to  take  account  of  this  may  be  possible. 
An  asynv>totically  efficient,  but  hi^ly  artificial,  proposal  along  these 
lines  was  made  by  Hall  and  Welsh  (1985),  in  a  more  restricted  setting  than 
the  one  considered  here.  Hie  Hall-Welsh  result  does  serve  to  indicate 
idiat  is  theoretically  possible. 

2.  Are  there  better  estimation  procedures  than  maximum  likelihood?  Hie 
well-known  asymptotic  optimality  of  maximum  likelihood  may  not  apply  when 
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bieis  is  taken  into  account,  and  indeed  Csorgo,  Deheuvels  and  Mason  (1985) 
have  indicated  one  vay  to  in^rove  it,  though  again  in  the  more  restricted 
setting  of  estimating  the  Pareto  index.  However,  there  may  be  sii<c>ler 
estimates  than  theirs  which  would  have  better  properties  than  nuuciraum 
likelihood,  the  probability-weighted  moments  estimator  being  presumably  a 
candidate.  theoretical  investigation  of  other  estimators  is  possible 
along  the  same  lines  as  developed  for  meucimum  likelihood  estimators  in 
this  paper. 

3.  Another  possibility  is  to  expemd  the  class  of  model  distributions  to 
something  wider  than  the  Generaaised  Extreme  Value  of  Generalised  Pareto 
claisses.  there  is  considerable  discussion  along  these  lines  in  parts  of 
the  hydrology  literature,  the  theoretical  status  of  vhich  is  ill-defined, 
but  there  are  obvious  possibilities  such  as  conbining  extreme  value 
distributions  with  a  family  of  transformations.  Since  this  would 
inevitably  involve  Increasing  the  varieuice  of  the  estimators  (because  of 
the  extra  paurameters),  the  only  theoretical  way  to  aissess  the  idea  would 
be  in  terms  of  some  form  of  trade-off  between  biaus  and  variamce. 
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TABLE  1  :  EXACT  BIAS  CALCULATIONS 
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PARENT 

DIST. 

METHOD 

N 

ESTIMATED  y 

CRUDE  BC 

Normal 

GEV 

100 

-.127 

-.095 

Normal 

GPD 

100 

-.127 

-.074 

Lognormal 

(1) 

GEV 

100 

.248 

.224 

Lognormal 

(1) 

GPD 

100 

.248 

.200 

Refl.Gamna 

<5) 

GEV 

100 

-.315 

-.275 

Refl.Gamna 

(5) 

GPD 

100 

-.315 

-.267 

*4 

GEV 

100 

.196 

.212 

^4 

GPD 

100 

.196 

.231 

tfeibull 
(0.5, 0.1) 

GEV 

500 

.161 

.129 

Welbull 
(0.5,0. 1) 

GPD 

500 

.161 

.099 

welbull 
(1.5, 1.0) 

GEV 

100 

-.072 

-.055 

WeilHill 
(1.5, 1.0) 

GPD 

lOO 

-.072 

-.043 

PERCENT  PERCENT 

BIAS  IN  BIAS  IN 


EXACT 

APPROX 

EXACT 

APPROX 

EXACP 

-.101 

-  .17 

-  .16 

.40 

.26 

-.089 

.  16 

.06 

.31 

.20 

.239 

3.15 

2.11 

-  .60 

.03 

.222 

.56 

.43 

-1.10 

-.31 

-.266 

1.21 

3.93 

1.60 

2.38 

-.258 

1.21 

2.06 

.85 

1.09 

.211 

-2.08 

-1.87 

-  .03 

-.09 

.225 

-  .46 

-  .50 

.27 

.18 

.145 

-  .74 

-  .10 

-  .70 

-.41 

.127 

-1.11 

-  .33 

-  .14 

-.08 

-.057 

-  .17 

-  .13 

.22 

.11 

-.053 

.07 

.00 

.18 

.10 
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